ftp.cs.arizona.edu




Internet Message Format | 2000-09-20 | 4KB

Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
    by baskerville.CS.Arizona.EDU (8.9.1a/8.9.1) with SMTP id NAA07475
    for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 11 Sep 1998 13:09:15 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
    id AA32575; Fri, 11 Sep 1998 13:08:48 -0700
To: icon-group@optima.CS.Arizona.EDU
Date: 11 Sep 1998 11:18:47 -0700
From: Patrick Scheible <kkt@itchy.serv.net>
Message-Id: <iozpc65x88.fsf@itchy.serv.net>
Organization: ServNet Internet Services
Sender: icon-group-request@optima.CS.Arizona.EDU
References: <199809102056.IAA16557@atlas.otago.ac.nz>
Subject: Re: Unicode support or support for non-Ascii based character manipulation?
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO

Gordon Peterson (http://www.computek.net/public/gep2/) wrote:

> Okay, I don't dispute that this move is happening but personally I
> still don't very much like it. The fact is that (at least here in the
> Western Hemisphere, where probably most of the world's computers are
> used) an eight-bit byte is already quite sufficient for most purposes,
> and doubling it comes at a cost in complexity and storage (RAM, disk,
> tape, whatever) which is simply very, very hard to justify on any
> genuine economic basis.

ASCII is also NOT adequate for many purposes even in the United States.
Almost every word processor has their own incompatible way of
representing diacritical marks and characters that were omitted from
ASCII. (By the way, did you know that there are other countries in the
Western Hemisphere besides the United States? And most of them don't
speak English?)

I work in a library, and libraries found plain ASCII inadequate all the
way back in the early 1960s, when the computer programmers were still
bitching about people who wanted lowercase letters.
(By the way, the character set libraries adopted does a lot better job
of accommodating all the Roman-alphabet languages than the later ISO
standards; pre-composed characters with diacritical marks greatly expand
the character set and still leave out some combinations that occur in
Roman-alphabet languages.) There are borrowed words with diacritical
marks, place names from foreign languages, personal names, quotations
from Old English. That's not even counting other Roman-alphabet
languages.

> If other countries have more difficult (or huge) character sets,
> that is (while a fact of life) simply an inherent disadvantage
> of their culture (and note that I'm not intending that as a slam
> or value judgement, it just IS the way it is), and I don't see a
> terribly convincing argument why the other countries (without
> that disadvantage) ought to pay the price too, just in order to
> artificially level the playing field.

Many of those non-Roman character sets are no more difficult than Roman.
Cyrillic has enough letters to spell the major sounds in its languages,
which you've got to admit is a plus. Greek, Hebrew, Arabic, and numerous
other alphabets are no harder in themselves than the Roman. Part of what
made them a pain to program was that most of the industry and national
standards organizations took it on themselves to make their own 8-bit
encodings, so you had to look outside the character string to interpret
the bytes in it. Even if you skip the Han character set parts of
Unicode, Unicode is a huge blessing in that all the other alphabets have
code points within Unicode.

The United States is not an island. Closing our eyes and pretending that
the rest of the world doesn't exist and doesn't buy our software would
be a bad idea even if it was possible.

If you're concerned about efficiency, maybe you should worry about all
the gratuitous graphics. Compressed Unicode uses little to no more disk
or tape space than uncompressed ASCII.
Compressing and uncompressing strings adds some complexity, but you get
some simplicity by not having to keep track of which character set
you're in and switching back and forth between character sets within
what is logically one string.

-- 
Patrick Scheible
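
A minimal sketch of two points made in the message above, assuming
Python 3 with its standard codecs and zlib module (neither appears in
the original post). The first function shows why a bare 8-bit string
needs information from outside the string: the same byte decodes to a
different letter under each code page. The second compares byte counts
for plain ASCII, raw UTF-16, and compressed UTF-16. zlib is only a
generic stand-in here; the post does not name the compression scheme it
has in mind. The byte value, the list of code pages, the function names,
and the sample text are all illustrative choices.

```python
import zlib

# One byte and three 8-bit code pages (Python codec names).
# The byte value is an arbitrary example, not taken from the post.
AMBIGUOUS = b"\xe1"
CODE_PAGES = ("latin-1", "iso8859_7", "koi8_r")


def decode_under_each(raw, code_pages=CODE_PAGES):
    """Map each code page name to the text the same bytes decode to."""
    return {name: raw.decode(name) for name in code_pages}


def sizes(text):
    """Byte counts for ASCII, raw UTF-16, and zlib-compressed UTF-16."""
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return {
        "ascii": len(text.encode("ascii")),
        "utf16": len(utf16),
        "utf16_zlib": len(zlib.compress(utf16, 9)),
    }


# ASCII-only sample so that the plain-ASCII encoding is possible at all.
SAMPLE = (
    "Part of what made them a pain to program was that most of the industry "
    "and national standards organizations took it on themselves to make "
    "their own 8-bit encodings, so you had to look outside the character "
    "string to interpret the bytes in it."
)

if __name__ == "__main__":
    # Same byte, different code point under each code page.
    for name, text in decode_under_each(AMBIGUOUS).items():
        print(f"{name:10} U+{ord(text):04X}")
    # Compare the three byte counts for the sample text.
    print(sizes(SAMPLE))
```

How close the compressed figure comes to the plain ASCII figure depends
on the text and its length, so the numbers printed for this short sample
should not be read as a general result.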